{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "359697d5",
   "metadata": {},
   "source": [
    "# LangChain Cookbook Part 2: Use Cases👨‍🍳👩‍🍳"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11d788b0",
   "metadata": {},
   "source": [
    "*This cookbook is based on the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*\n",
    "\n",
    "**Goals:**\n",
    "\n",
    "1. Inspire you to build\n",
    "2. Provide an introductory understanding of the main use cases of LangChain via [ELI5](https://www.dictionary.com/e/slang/eli5/#:~:text=ELI5%20is%20short%20for%20%E2%80%9CExplain,a%20complicated%20question%20or%20problem.) examples and code snippets. For an introduction to the *fundamentals* of LangChain check out [Cookbook Part 1: Fundamentals](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%201%20-%20Fundamentals.ipynb).\n",
    "\n",
    "**LangChain Links:**\n",
    "* [LC Conceptual Documentation](https://docs.langchain.com/docs/)\n",
    "* [LC Python Documentation](https://python.langchain.com/en/latest/)\n",
    "* [LC Javascript/Typescript Documentation](https://js.langchain.com/docs/)\n",
    "* [LC Discord](https://discord.gg/6adMQxSpJS)\n",
    "* [www.langchain.com](https://langchain.com/)\n",
    "* [LC Twitter](https://twitter.com/LangChainAI)\n",
    "\n",
    "\n",
    "### **What is LangChain?**\n",
    "> LangChain is a framework for developing applications powered by language models.\n",
    "*[Source](https://blog.langchain.dev/announcing-our-10m-seed-round-led-by-benchmark/#:~:text=LangChain%20is%20a%20framework%20for%20developing%20applications%20powered%20by%20language%20models)*\n",
    "\n",
    "**TLDR**: LangChain makes the complicated parts of working & building with AI models easier. It helps do this in two ways:\n",
    "\n",
    "1. **Integration** - Bring external data, such as your files, other applications, and api data, to your LLMs\n",
    "2. **Agency** - Allow your LLMs to interact with its environment via decision making. Use LLMs to help decide which action to take next\n",
    "\n",
    "### **Why LangChain?**\n",
    "1. **Components** - LangChain makes it easy to swap out abstractions and components necessary to work with language models.\n",
    "\n",
    "2. **Customized Chains** - LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.\n",
    "\n",
    "3. **Speed 🚢** - This team ships insanely fast. You'll be up to date with the latest LLM features.\n",
    "\n",
    "4. **Community 👥** - Wonderful [discord](https://discord.gg/6adMQxSpJS) and community support, meet ups, hackathons, etc.\n",
    "\n",
    "Though LLMs can be straightforward (text-in, text-out) you'll quickly run into friction points that LangChain helps with once you develop more complicated applications.\n",
    "\n",
    "### **Main Use Cases**\n",
    "\n",
    "* **Summarization** - Express the most important facts about a body of text or chat interaction\n",
    "* **Question and Answering Over Documents** - Use information held within documents to answer questions or query\n",
    "* **Extraction** - Pull structured data from a body of text or an user query\n",
    "* **Evaluation** - Understand the quality of output from your application\n",
    "* **Querying Tabular Data** - Pull data from databases or other tabular source\n",
    "* **Code Understanding** - Reason about and digest code\n",
    "* **Interacting with APIs** - Query APIs and interact with the outside world\n",
    "* **Chatbots** - A framework to have a back and forth interaction with a user combined with memory in a chat interface\n",
    "* **Agents** - Use LLMs to make decisions about what to do next. Enable these decisions with tools.\n",
    "\n",
    "Want to see live examples of these use cases? Head over to the [LangChain Project Gallery](https://github.com/gkamradt/langchain-tutorials)\n",
    "\n",
    "#### **Authors Note:**\n",
    "\n",
    "* This cookbook will not cover all aspects of LangChain. It's contents have been curated to get you to building & impact as quick as possible. For more, please check out [LangChain Technical Documentation](https://python.langchain.com/en/latest/index.html)\n",
    "* This notebook assumes is that you've seen part 1 of this series [Fundamentals](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%201%20-%20Fundamentals.ipynb). This notebook is focused on what to do and how to apply those fundamentals.\n",
    "* You'll notice I repeat import statements throughout the notebook. My intention is to lean on the side of clarity and help you see the full code block in one spot. No need to go back and forth to see when we imported a package.\n",
    "* We use the default models throughout the notebook, at the time of writing they were davinci-003 and gpt-3.5-turbo. You would no doubt get better results with GPT4\n",
    "\n",
    "Let's get started"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e323fb6",
   "metadata": {},
   "source": [
    "Throughout this tutorial we will use OpenAI's various [models](https://platform.openai.com/docs/models/overview). LangChain makes it easy to [subsistute LLMs](https://langchain.com/integrations.html#:~:text=integrations%20LangChain%20provides.-,LLMs,-LLM%20Provider) so you can BYO-LLM if you want"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e9815081",
   "metadata": {
    "hide_input": false
   },
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "import os\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "openai_api_key = os.getenv('OPENAI_API_KEY', 'YourAPIKeyIfNotSet')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "dcd3587c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.container { width:90% !important; }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Run this cell if you want to make your display wider\n",
    "from IPython.display import display, HTML\n",
    "display(HTML(\"<style>.container { width:90% !important; }</style>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05bb564d",
   "metadata": {},
   "source": [
    "# LangChain Use Cases"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bbdb1dc",
   "metadata": {},
   "source": [
    "## Summarization\n",
    "\n",
    "One of the most common use cases for LangChain and LLMs is summarization. You can summarize any piece of text, but use cases span from summarizing calls, articles, books, academic papers, legal documents, user history, a table, or financial documents. It's super helpful to have a tool which can summarize information quickly.\n",
    "\n",
    "* **Deep Dive** - (Coming Soon)\n",
    "* **Examples** - [Summarizing B2B Sales Calls](https://www.youtube.com/watch?v=DIw4rbpI9ic)\n",
    "* **Use Cases** - Summarize Articles, Transcripts, Chat History, Slack/Discord, Customer Interactions, Medical Papers, Legal Documents, Podcasts, Tweet Threads, Code Bases, Product Reviews, Financial Documents\n",
    "\n",
    "### Summaries Of Short Text\n",
    "\n",
    "For summaries of short texts, the method is straightforward, in fact you don't need to do anything fancy other than simple prompting with instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0c292592",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import PromptTemplate\n",
    "\n",
    "# Note, the default model is already 'text-davinci-003' but I call it out here explicitly so you know where to change it later if you want\n",
    "llm = OpenAI(temperature=0, model_name='text-davinci-003', openai_api_key=openai_api_key)\n",
    "\n",
    "# Create our template\n",
    "template = \"\"\"\n",
    "%INSTRUCTIONS:\n",
    "Please summarize the following piece of text.\n",
    "Respond in a manner that a 5 year old would understand.\n",
    "\n",
    "%TEXT:\n",
    "{text}\n",
    "\"\"\"\n",
    "\n",
    "# Create a LangChain prompt template that we can insert values to later\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"text\"],\n",
    "    template=template,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f539cb53",
   "metadata": {},
   "source": [
    "Let's let's find a confusing text online. *[Source](https://www.smithsonianmag.com/smart-news/long-before-trees-overtook-the-land-earth-was-covered-by-giant-mushrooms-13709647/)*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0df2cde6",
   "metadata": {},
   "outputs": [],
   "source": [
    "confusing_text = \"\"\"\n",
    "For the next 130 years, debate raged.\n",
    "Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.\n",
    "“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.\n",
    "“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03d31842",
   "metadata": {},
   "source": [
    "Let's take a look at what prompt will be sent to the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "406eb8a3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "------- Prompt Begin -------\n",
      "\n",
      "%INSTRUCTIONS:\n",
      "Please summarize the following piece of text.\n",
      "Respond in a manner that a 5 year old would understand.\n",
      "\n",
      "%TEXT:\n",
      "\n",
      "For the next 130 years, debate raged.\n",
      "Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.\n",
      "“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.\n",
      "“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”\n",
      "\n",
      "\n",
      "------- Prompt End -------\n"
     ]
    }
   ],
   "source": [
    "print (\"------- Prompt Begin -------\")\n",
    "\n",
    "final_prompt = prompt.format(text=confusing_text)\n",
    "print(final_prompt)\n",
    "\n",
    "print (\"------- Prompt End -------\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a95e53d9",
   "metadata": {},
   "source": [
    "Finally let's pass it through the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bc7e4b42",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "For 130 years, people argued about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But no one could agree. It was so big that it was hard to figure out what it was.\n"
     ]
    }
   ],
   "source": [
    "output = llm(final_prompt)\n",
    "print (output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "751c6359",
   "metadata": {},
   "source": [
    "This method works fine, but for longer text, it can become a pain to manage and you'll run into token limits. Luckily LangChain has out of the box support for different methods to summarize via their [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html).\n",
    "\n",
    "### Summaries Of Longer Text\n",
    "\n",
    "*Note: This method will also work for short text too*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3441484b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain.chains.summarize import load_summarize_chain\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e95b575c",
   "metadata": {},
   "source": [
    "Let's load up a longer document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6c33f9bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "April 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the\n",
      "phrase that became our motto: Make something people want.  We've\n",
      "learned a lot since then, but if I were choosing now that's still\n",
      "the one I'd pick.\n"
     ]
    }
   ],
   "source": [
    "with open('data/PaulGrahamEssays/good.txt', 'r') as file:\n",
    "    text = file.read()\n",
    "\n",
    "# Printing the first 285 characters as a preview\n",
    "print (text[:285])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b489d2a2",
   "metadata": {},
   "source": [
    "Then let's check how many tokens are in this document. [get_num_tokens](https://python.langchain.com/en/latest/reference/modules/llms.html#langchain.llms.OpenAI.get_num_tokens) is a nice method for this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5e0e8181",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 3970 tokens in your file\n"
     ]
    }
   ],
   "source": [
    "num_tokens = llm.get_num_tokens(text)\n",
    "\n",
    "print (f\"There are {num_tokens} tokens in your file\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bf8eda6",
   "metadata": {},
   "source": [
    "While you could likely stuff this text in your prompt, let's act like it's too big and needs another method.\n",
    "\n",
    "First we'll need to split it up. This process is called 'chunking' or 'splitting' your text into smaller pieces. I like the [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) because it's easy to control but there are a [bunch](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) you can try"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "25dd80dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You now have 4 docs intead of 1 piece of text\n"
     ]
    }
   ],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(separators=[\"\\n\\n\", \"\\n\"], chunk_size=5000, chunk_overlap=350)\n",
    "docs = text_splitter.create_documents([text])\n",
    "\n",
    "print (f\"You now have {len(docs)} docs intead of 1 piece of text\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e7547a3",
   "metadata": {},
   "source": [
    "Next we need to load up a chain which will make successive calls to the LLM for us. Want to see the prompt being used in the chain below? Check out the [LangChain documentation](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/map_reduce_prompt.py)\n",
    "\n",
    "For information on the difference between chain types, check out this video on [token limit workarounds](https://youtu.be/f9_BWhCI4Zo)\n",
    "\n",
    "*Note: You could also get fancy and make the first 4 calls of the map_reduce run in parallel too*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "28ddd9c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your chain ready to use\n",
    "chain = load_summarize_chain(llm=llm, chain_type='map_reduce') # verbose=True optional to see what is getting sent to the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "be0b2d04",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " This essay looks at the idea of benevolence in startups, and how it can help them succeed. It explains how benevolence can improve morale, make people want to help, and help startups be decisive. It also looks at how markets have evolved to value potential dividends and potential earnings, and how users dislike their new operating system. The author argues that starting a company with benevolent aims is currently undervalued, and that Y Combinator's motto of \"Make something people want\" is a powerful concept.\n"
     ]
    }
   ],
   "source": [
    "# Use it. This will run through the 4 documents, summarize the chunks, then get a summary of the summary.\n",
    "output = chain.run(docs)\n",
    "print (output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2d664fc",
   "metadata": {},
   "source": [
    "## Question & Answering Using Documents As Context"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad87c72b",
   "metadata": {},
   "source": [
    "*[LangChain Question & Answer Docs](https://python.langchain.com/en/latest/use_cases/question_answering.html)*\n",
    "\n",
    "In order to use LLMs for question and answer we must:\n",
    "\n",
    "1. Pass the LLM relevant context it needs to answer a question\n",
    "2. Pass it our question that we want answered\n",
    "\n",
    "Simplified, this process looks like this \"llm(your context + your question) = your answer\"\n",
    "\n",
    "* **Deep Dive** - [Question A Book](https://youtu.be/h0DHDp1FbmQ), [Ask Questions To Your Custom Files](https://youtu.be/EnT-ZTrcPrg), [Chat Your Data JS (1000 pages of Financial Reports)](https://www.youtube.com/watch?v=Ix9WIZpArm0&t=1051s), [LangChain Q&A webinar](https://www.crowdcast.io/c/rh66hcwivly0)\n",
    "* **Examples** - [ChatPDF](https://www.chatpdf.com/)\n",
    "* **Use Cases** - Chat your documents, ask questions to academic papers, create study guides, reference medical information"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "685e15f3",
   "metadata": {},
   "source": [
    "### Simple Q&A Example\n",
    "\n",
    "Here let's review the convention of `llm(your context + your question) = your answer`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "9ebd8451",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b4795187",
   "metadata": {},
   "outputs": [],
   "source": [
    "context = \"\"\"\n",
    "Rachel is 30 years old\n",
    "Bob is 45 years old\n",
    "Kevin is 65 years old\n",
    "\"\"\"\n",
    "\n",
    "question = \"Who is under 40 years old?\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2184b11b",
   "metadata": {},
   "source": [
    "Then combine them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "0c53650d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Rachel is under 40 years old.\n"
     ]
    }
   ],
   "source": [
    "output = llm(context + question)\n",
    "\n",
    "# I strip the text to remove the leading and trailing whitespace\n",
    "print (output.strip())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "385180ca",
   "metadata": {},
   "source": [
    "As we ramp up our sophistication, we'll take advantage of this convention more.\n",
    "\n",
    "The hard part comes in when you need to be selective about *which* data you put in your context. This field of study is called \"[document retrieval](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)\" and tightly coupled with AI Memory."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53ed4080",
   "metadata": {},
   "source": [
    "### Using Embeddings\n",
    "\n",
    "I informally call what were about to go through as \"The VectorStore Dance\". It's the process of splitting your text, embedding the chunks, putting the embeddings in a DB, and then querying them. For a full video on this check out [How To Question A Book](https://www.youtube.com/watch?v=h0DHDp1FbmQ)\n",
    "\n",
    "The goal is to select relevant chunks of our long text, but which chunks do we pull? The most popular method is to pull *similar* texts based off comparing vector embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "a7a02ccc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain import OpenAI\n",
    "\n",
    "# The vectorstore we'll be using\n",
    "from langchain.vectorstores import FAISS\n",
    "\n",
    "# The LangChain component we'll use to get the documents\n",
    "from langchain.chains import RetrievalQA\n",
    "\n",
    "# The easy document loader for text\n",
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "# The embedding engine that will convert our text to vectors\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40afcfec",
   "metadata": {},
   "source": [
    "Let's load up a longer document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5772bc26",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 1 document\n",
      "You have 74663 characters in that document\n"
     ]
    }
   ],
   "source": [
    "loader = TextLoader('data/PaulGrahamEssays/worked.txt')\n",
    "doc = loader.load()\n",
    "print (f\"You have {len(doc)} document\")\n",
    "print (f\"You have {len(doc[0].page_content)} characters in that document\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb87424c",
   "metadata": {},
   "source": [
    "Now let's split our long doc into smaller pieces"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b4a6e452",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)\n",
    "docs = text_splitter.split_documents(doc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "723e8aec",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Now you have 29 documents that have an average of 2,930 characters (smaller pieces)\n"
     ]
    }
   ],
   "source": [
    "# Get the total number of characters so we can see the average later\n",
    "num_total_characters = sum([len(x.page_content) for x in docs])\n",
    "\n",
    "print (f\"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "9b591198",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your embeddings engine ready\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)\n",
    "\n",
    "# Embed your documents and combine with the raw text in a pseudo db. Note: This will make an API call to OpenAI\n",
    "docsearch = FAISS.from_documents(docs, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b13348",
   "metadata": {},
   "source": [
    "Create your retrieval engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "47cd969d",
   "metadata": {},
   "outputs": [],
   "source": [
    "qa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6aa2963c",
   "metadata": {},
   "source": [
    "Now it's time to ask a question. The retriever will go get the similar documents and combine with your question for the LLM to reason through.\n",
    "\n",
    "Note: It may not seem like much, but the magic here is that we didn't have to pass in our full original document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6a062c85",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' The author describes painting as good work.'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What does the author describe as good work?\"\n",
    "qa.run(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be503d53",
   "metadata": {},
   "source": [
    "If you wanted to do more you would hook this up to a cloud vector database, use a tool like metal and start managing your documents, with external data sources"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3d04dc9",
   "metadata": {},
   "source": [
    "## Extraction\n",
    "*[LangChain Extraction Docs](https://python.langchain.com/en/latest/use_cases/extraction.html)*\n",
    "\n",
    "Extraction is the process of parsing data from a piece of text. This is commonly used with output parsing in order to *structure* our data.\n",
    "\n",
    "* **Deep Dive** - [Use LLMs to Extract Data From Text (Expert Level Text Extraction](https://youtu.be/xZzvwR9jdPA), [Structured Output From OpenAI (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)\n",
    "* **Examples** - [OpeningAttributes](https://twitter.com/GregKamradt/status/1646500373837008897)\n",
    "* **Use Cases:** Extract a structured row from a sentence to insert into a database, extract multiple rows from a long document to insert into a database, extracting parameters from a user query to make an API call\n",
    "\n",
    "A popular library for extraction is [Kor](https://eyurtsev.github.io/kor/). We won't cover it today but I highly suggest checking it out for advanced extraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "904d43c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# To help construct our Chat Messages\n",
    "from langchain.schema import HumanMessage\n",
    "from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate\n",
    "\n",
    "# We will be using a chat model, defaults to gpt-3.5-turbo\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "# To parse outputs and get structured data back\n",
    "from langchain.output_parsers import StructuredOutputParser, ResponseSchema\n",
    "\n",
    "chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6923ca8b",
   "metadata": {},
   "source": [
    "### Vanilla Extraction\n",
    "\n",
    "Let's start off with an easy example. Here I simply supply a prompt with instructions with the type of output I want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "ab1cce97",
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions = \"\"\"\n",
    "You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them\n",
    "Return the fruit name and emojis in a python dictionary\n",
    "\"\"\"\n",
    "\n",
    "fruit_names = \"\"\"\n",
    "Apple, Pear, this is an kiwi\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "38f16ea4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}\n",
      "<class 'str'>\n"
     ]
    }
   ],
   "source": [
    "# Make your prompt which combines the instructions w/ the fruit names\n",
    "prompt = (instructions + fruit_names)\n",
    "\n",
    "# Call the LLM\n",
    "output = chat_model([HumanMessage(content=prompt)])\n",
    "\n",
    "print (output.content)\n",
    "print (type(output.content))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39d6cff3",
   "metadata": {},
   "source": [
    "Let's turn this into a proper python dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "314286b4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}\n",
      "<class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "output_dict = eval(output.content)\n",
    "\n",
    "print (output_dict)\n",
    "print (type(output_dict))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3909eb29",
   "metadata": {},
   "source": [
    "While this worked this time, it's not a long term reliable method for more advanced use cases"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6a0a90d",
   "metadata": {},
   "source": [
    "### Using LangChain's Response Schema\n",
    "\n",
    "LangChain's response schema will does two things for us: \n",
    "\n",
    "1. Autogenerate the a prompt with bonafide format instructions. This is great because I don't need to worry about the prompt engineering side, I'll leave that up to LangChain!\n",
    "\n",
    "2. Read the output from the LLM and turn it into a proper python object for me\n",
    "\n",
    "Here I define the schema I want. I'm going to pull out the song and artist that a user wants to play from a pseudo chat message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "dc2ba0be",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The schema I want out\n",
    "response_schemas = [\n",
    "    ResponseSchema(name=\"artist\", description=\"The name of the musical artist\"),\n",
    "    ResponseSchema(name=\"song\", description=\"The name of the song that the artist plays\")\n",
    "]\n",
    "\n",
    "# The parser that will look for the LLM output in my schema and return it back to me\n",
    "output_parser = StructuredOutputParser.from_response_schemas(response_schemas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "f9e3c6cf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"\\`\\`\\`json\" and \"\\`\\`\\`\":\n",
      "\n",
      "```json\n",
      "{\n",
      "\t\"artist\": string  // The name of the musical artist\n",
      "\t\"song\": string  // The name of the song that the artist plays\n",
      "}\n",
      "```\n"
     ]
    }
   ],
   "source": [
    "# The format instructions that LangChain makes. Let's look at them\n",
    "format_instructions = output_parser.get_format_instructions()\n",
    "print(format_instructions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "d702900c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The prompt template that brings it all together\n",
    "# Note: This is a different prompt template than before because we are using a Chat Model\n",
    "\n",
    "prompt = ChatPromptTemplate(\n",
    "    messages=[\n",
    "        HumanMessagePromptTemplate.from_template(\"Given a command from the user, extract the artist and song names \\n \\\n",
    "                                                    {format_instructions}\\n{user_prompt}\")  \n",
    "    ],\n",
    "    input_variables=[\"user_prompt\"],\n",
    "    partial_variables={\"format_instructions\": format_instructions}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "bb6adde9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Given a command from the user, extract the artist and song names \n",
      "                                                     The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"\\`\\`\\`json\" and \"\\`\\`\\`\":\n",
      "\n",
      "```json\n",
      "{\n",
      "\t\"artist\": string  // The name of the musical artist\n",
      "\t\"song\": string  // The name of the song that the artist plays\n",
      "}\n",
      "```\n",
      "I really like So Young by Portugal. The Man\n"
     ]
    }
   ],
   "source": [
    "fruit_query = prompt.format_prompt(user_prompt=\"I really like So Young by Portugal. The Man\")\n",
    "print (fruit_query.messages[0].content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "b8664302",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'artist': 'Portugal. The Man', 'song': 'So Young'}\n",
      "<class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "fruit_output = chat_model(fruit_query.to_messages())\n",
    "output = output_parser.parse(fruit_output.content)\n",
    "\n",
    "print (output)\n",
    "print (type(output))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b68b8eeb",
   "metadata": {},
   "source": [
    "Awesome, now we have a dictionary that we can use later down the line\n",
    "\n",
    "<span style=\"background:#fff5d6\">Warning:</span> The parser looks for an output from the LLM in a specific format. Your model may not output the same format every time. Make sure to handle errors with this one. GPT4 and future iterations will be more reliable.\n",
    "\n",
    "For more advanced parsing check out [Kor](https://eyurtsev.github.io/kor/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2fb4ba6",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "*[LangChain Evaluation Docs](https://python.langchain.com/en/latest/use_cases/evaluation.html)*\n",
    "\n",
    "Evaluation is the process of doing quality checks on the output of your applications. Normal, deterministic, code has tests we can run, but judging the output of LLMs is more difficult because of the unpredictableness and variability of natural language. LangChain provides tools that aid us in this journey.\n",
    "\n",
    "* **Deep Dive** - Coming Soon\n",
    "* **Examples** - [Lance Martin's Advanced](https://twitter.com/RLanceMartin) [Auto-Evaluator](https://github.com/rlancemartin/auto-evaluator)\n",
    "* **Use Cases:** Run quality checks on your summarization or Question & Answer pipelines, check the output of you summarization pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "9fbaa6e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Embeddings, store, and retrieval\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.chains import RetrievalQA\n",
    "\n",
    "# Model and doc loader\n",
    "from langchain import OpenAI\n",
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "# Eval!\n",
    "from langchain.evaluation.qa import QAEvalChain\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "9f35fa12",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 1 document\n",
      "You have 74663 characters in that document\n"
     ]
    }
   ],
   "source": [
    "# Our long essay from before\n",
    "loader = TextLoader('data/PaulGrahamEssays/worked.txt')\n",
    "doc = loader.load()\n",
    "\n",
    "print (f\"You have {len(doc)} document\")\n",
    "print (f\"You have {len(doc[0].page_content)} characters in that document\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7acca7da",
   "metadata": {},
   "source": [
    "First let's do the Vectorestore dance so we can do question and answers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "1955faef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Now you have 29 documents that have an average of 2,930 characters (smaller pieces)\n"
     ]
    }
   ],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)\n",
    "docs = text_splitter.split_documents(doc)\n",
    "\n",
    "# Get the total number of characters so we can see the average later\n",
    "num_total_characters = sum([len(x.page_content) for x in docs])\n",
    "\n",
    "print (f\"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "890b85ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Embeddings and docstore\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)\n",
    "docsearch = FAISS.from_documents(docs, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a0d6e25",
   "metadata": {},
   "source": [
    "Make your retrieval chain. Notice how I have an `input_key` parameter now. This tells the chain which key from a dictionary I supply has my prompt/query in it. I specify `question` to match the question in the dict below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "ddb3f3b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever(), input_key=\"question\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b37dd0cd",
   "metadata": {},
   "source": [
    "Now I'll pass a list of questions and ground truth answers to the LLM that I know are correct (I validated them as a human)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "d93d08bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "question_answers = [\n",
    "    {'question' : \"Which company sold the microcomputer kit that his friend built himself?\", 'answer' : 'Healthkit'},\n",
    "    {'question' : \"What was the small city he talked about in the city that is the financial capital of USA?\", 'answer' : 'Yorkville, NY'}\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98c4b591",
   "metadata": {},
   "source": [
    "I'll use `chain.apply` to run both my questions one by one separately.\n",
    "\n",
    "One of the cool parts is that I'll get my list of question and answers dictionaries back, but there'll be another key in the dictionary `result` which will be the output from the LLM.\n",
    "\n",
    "Note: I specifically made my 2nd question ambigious and tough to answer in one pass so the LLM would get it incorrect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "a4a4e041",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'question': 'Which company sold the microcomputer kit that his friend built himself?',\n",
       "  'answer': 'Healthkit',\n",
       "  'result': ' The microcomputer kit was sold by Heathkit.'},\n",
       " {'question': 'What was the small city he talked about in the city that is the financial capital of USA?',\n",
       "  'answer': 'Yorkville, NY',\n",
       "  'result': ' The small city he talked about is New York City, which is the financial capital of the United States.'}]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictions = chain.apply(question_answers)\n",
    "predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed1226c9",
   "metadata": {},
   "source": [
    "We then have the LLM compare my ground truth answer (the `answer` key) with the result from the LLM (`result` key).\n",
    "\n",
    "Or simply, we are asking the LLM to grade itself. What a wild world we live in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "ae119b18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Start your eval chain\n",
    "eval_chain = QAEvalChain.from_llm(llm)\n",
    "\n",
    "# Have it grade itself. The code below helps the eval_chain know where the different parts are\n",
    "graded_outputs = eval_chain.evaluate(question_answers,\n",
    "                                     predictions,\n",
    "                                     question_key=\"question\",\n",
    "                                     prediction_key=\"result\",\n",
    "                                     answer_key='answer')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "c2882750",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'text': ' CORRECT'}, {'text': ' INCORRECT'}]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "graded_outputs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b30268b",
   "metadata": {},
   "source": [
    "This is correct! Notice how the answer in question #1 was \"Healthkit\" and the prediction was \"The microcomputer kit was sold by Heathkit.\" The LLM knew that the answer and result were the same and gave us a \"correct\" label. Awesome.\n",
    "\n",
    "For #2 it knew they were not the same and gave us an \"incorrect\" label"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2745752",
   "metadata": {},
   "source": [
    "## Querying Tabular Data\n",
    "\n",
    "*[LangChain Querying Tabular Data Docs](https://python.langchain.com/en/latest/use_cases/tabular.html)*\n",
    "\n",
    "The most common type of data in the world sits in tabular form (ok, ok, besides unstructured data). It is super powerful to be able to query this data with LangChain and pass it through to an LLM \n",
    "\n",
    "* **Deep Dive** - Coming Soon\n",
    "* **Examples** - TBD\n",
    "* **Use Cases:** Use LLMs to query data about users, do data analysis, get real time information from your DBs\n",
    "\n",
    "For futher reading check out \"Agents + Tabular Data\" ([Pandas](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/pandas.html), [SQL](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html), [CSV](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html))\n",
    "\n",
    "Let's query an SQLite DB with natural language. We'll look at the [San Francisco Trees](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "9b19c2d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain import OpenAI, SQLDatabase, SQLDatabaseChain\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "294a4e7f",
   "metadata": {},
   "source": [
    "We'll start off by specifying where our data is and get the connection ready"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "6044d54e",
   "metadata": {},
   "outputs": [],
   "source": [
    "sqlite_db_path = 'data/San_Francisco_Trees.db'\n",
    "db = SQLDatabase.from_uri(f\"sqlite:///{sqlite_db_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "203eedd4",
   "metadata": {},
   "source": [
    "Then we'll create a chain that take our LLM, and DB. I'm setting `verbose=True` so you can see what is happening underneath the hood."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "dccf0957",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/gregorykamradt/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:63: UserWarning: Directly instantiating an SQLDatabaseChain with an llm is deprecated. Please instantiate with llm_chain argument or using the from_llm class method.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "99cdbc44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
      "How many Species of trees are there in San Francisco?\n",
      "SQLQuery:\u001b[32;1m\u001b[1;3mSELECT COUNT(DISTINCT \"qSpecies\") FROM \"SFTrees\";\u001b[0m\n",
      "SQLResult: \u001b[33;1m\u001b[1;3m[(578,)]\u001b[0m\n",
      "Answer:\u001b[32;1m\u001b[1;3mThere are 578 Species of trees in San Francisco.\u001b[0m\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'There are 578 Species of trees in San Francisco.'"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db_chain.run(\"How many Species of trees are there in San Francisco?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bd61598",
   "metadata": {},
   "source": [
    "This is awesome! There are actually a few steps going on here.\n",
    "\n",
    "**Steps:**\n",
    "1. Find which table to use\n",
    "2. Find which column to use\n",
    "3. Construct the correct sql query\n",
    "4. Execute that query\n",
    "5. Get the result\n",
    "6. Return a natural language reponse back\n",
    "\n",
    "Let's confirm via pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "299ff6ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "import pandas as pd\n",
    "\n",
    "# Connect to the SQLite database\n",
    "connection = sqlite3.connect(sqlite_db_path)\n",
    "\n",
    "# Define your SQL query\n",
    "query = \"SELECT count(distinct qSpecies) FROM SFTrees\"\n",
    "\n",
    "# Read the SQL query into a Pandas DataFrame\n",
    "df = pd.read_sql_query(query, connection)\n",
    "\n",
    "# Close the connection\n",
    "connection.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "f1b2dd89",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "578\n"
     ]
    }
   ],
   "source": [
    "# Display the result in the first column first cell\n",
    "print(df.iloc[0,0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c48b5a42",
   "metadata": {},
   "source": [
    "Nice! The answers match."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04293535",
   "metadata": {},
   "source": [
    "## Code Understanding\n",
    "\n",
    "*[LangChain Code Understanding Docs](https://python.langchain.com/en/latest/use_cases/code.html)*\n",
    "\n",
    "One of the most exciting abilities of LLMs is code undestanding. People around the world are leveling up their output in both speed & quality due to AI help. A big part of this is having a LLM that can understand code and help you with a particular task.\n",
    "\n",
    "* **Deep Dive** - Coming Soon\n",
    "* **Examples** - TBD\n",
    "* **Use Cases:** Co-Pilot-esque functionality that can help answer questions from a specific library, help you generate new code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "f3101c11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper to read local files\n",
    "import os\n",
    "\n",
    "# Vector Support\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "\n",
    "# Model and chain\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "# Text splitters\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "llm = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c4dfd6d",
   "metadata": {},
   "source": [
    "We will do the Vectorstore dance again"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "8a9247e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf12eb2c",
   "metadata": {},
   "source": [
    "I put a small python package [The Fuzz](https://github.com/seatgeek/thefuzz) (personal indie favorite) in the data folder of this repo.\n",
    "\n",
    "The loop below will go through each file in the library and load it up as a doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "bd3973a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "root_dir = 'data/thefuzz'\n",
    "docs = []\n",
    "\n",
    "# Go through each folder\n",
    "for dirpath, dirnames, filenames in os.walk(root_dir):\n",
    "    \n",
    "    # Go through each file\n",
    "    for file in filenames:\n",
    "        try: \n",
    "            # Load up the file as a doc and split\n",
    "            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
    "            docs.extend(loader.load_and_split())\n",
    "        except Exception as e: \n",
    "            pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "136cae5e",
   "metadata": {},
   "source": [
    "Let's look at an example of a document. It's just code!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "85a39161",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 175 documents\n",
      "\n",
      "------ Start Document ------\n",
      "import unittest\n",
      "import re\n",
      "import pycodestyle\n",
      "\n",
      "from thefuzz import fuzz\n",
      "from thefuzz import process\n",
      "from thefuzz import utils\n",
      "from thefuzz.string_processing import StringProcessor\n",
      "\n",
      "\n",
      "class StringProcessingTest(unittest.TestCase):\n",
      "    def test_replace_non_letters_non_numbers_with_whitespace(self):\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "print (f\"You have {len(docs)} documents\\n\")\n",
    "print (\"------ Start Document ------\")\n",
    "print (docs[0].page_content[:300])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02634791",
   "metadata": {},
   "source": [
    "Embed and store them in a docstore. This will make an API call to OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "94427072",
   "metadata": {},
   "outputs": [],
   "source": [
    "docsearch = FAISS.from_documents(docs, embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "2071f3c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get our retriever ready\n",
    "qa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "0536b828",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"What function do I use if I want to find the most similar item in a list of items?\"\n",
    "output = qa.run(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "b9074fb6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You can use the `process.extractOne()` function from `thefuzz` package to find the most similar item in a list of items. Here's an example:\n",
      "\n",
      "```\n",
      "from thefuzz import process\n",
      "\n",
      "choices = [\"apple\", \"banana\", \"orange\", \"pear\"]\n",
      "query = \"pineapple\"\n",
      "\n",
      "best_match = process.extractOne(query, choices)\n",
      "print(best_match)\n",
      "```\n",
      "\n",
      "This would output `(u'apple', 36)`, which means that the most similar item to \"pineapple\" in the list of choices is \"apple\", with a similarity score of 36.\n"
     ]
    }
   ],
   "source": [
    "print (output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "f53860e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation\"\n",
    "output = qa.run(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "27e56a5d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "import fuzzywuzzy.process as process\n",
      "\n",
      "choices = [\n",
      "    \"new york mets vs chicago cubs\",\n",
      "    \"chicago cubs at new york mets\",\n",
      "    \"atlanta braves vs pittsbugh pirates\",\n",
      "    \"new york yankees vs boston red sox\"\n",
      "]\n",
      "\n",
      "query = \"new york mets at chicago cubs\"\n",
      "\n",
      "best = process.extractOne(query, choices)\n",
      "print(best[0])\n"
     ]
    }
   ],
   "source": [
    "print (output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7deae05",
   "metadata": {},
   "source": [
    "[¡Shibby!](https://thumbs.gfycat.com/WateryBeneficialDeermouse-size_restricted.gif)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3b2783a",
   "metadata": {},
   "source": [
    "## Interacting with APIs\n",
    "\n",
    "*[LangChain API Interaction Docs](https://python.langchain.com/en/latest/use_cases/apis.html)*\n",
    "\n",
    "If the data or action you need is behind an API, you'll need your LLM to interact with APIs\n",
    "\n",
    "* **Deep Dive** - Coming Soon\n",
    "* **Examples** - TBD\n",
    "* **Use Cases:** Understand a request from a user and carry out an action, be able to automate more real-world workflows\n",
    "\n",
    "This topic is closely related to Agents and Plugins, though we'll look at a simple use case for this section. For more information, check out [LangChain + plugins](https://python.langchain.com/en/latest/use_cases/agents/custom_agent_with_plugin_retrieval_using_plugnplai.html) documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "352685c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import APIChain\n",
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6b834fe",
   "metadata": {},
   "source": [
    "LangChain's APIChain has the ability to read API documentation and understand which endpoint it needs to call.\n",
    "\n",
    "In this case I wrote (purposefully sloppy) API documentation to demonstrate how this works"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "3ff4b986",
   "metadata": {},
   "outputs": [],
   "source": [
    "api_docs = \"\"\"\n",
    "\n",
    "BASE URL: https://restcountries.com/\n",
    "\n",
    "API Documentation:\n",
    "\n",
    "The API endpoint /v3.1/name/{name} Used to find informatin about a country. All URL parameters are listed below:\n",
    "    - name: Name of country - Ex: italy, france\n",
    "    \n",
    "The API endpoint /v3.1/currency/{currency} Uesd to find information about a region. All URL parameters are listed below:\n",
    "    - currency: 3 letter currency. Example: USD, COP\n",
    "    \n",
    "Woo! This is my documentation\n",
    "\"\"\"\n",
    "\n",
    "chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "221aa3a6",
   "metadata": {},
   "source": [
    "Let's try to make an API call that is meant for the country endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "e6d9cae4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new APIChain chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m https://restcountries.com/v3.1/name/france\u001b[0m\n",
      "\u001b[33;1m\u001b[1;3m[{\"name\":{\"common\":\"France\",\"official\":\"French Republic\",\"nativeName\":{\"fra\":{\"official\":\"République française\",\"common\":\"France\"}}},\"tld\":[\".fr\"],\"cca2\":\"FR\",\"ccn3\":\"250\",\"cca3\":\"FRA\",\"cioc\":\"FRA\",\"independent\":true,\"status\":\"officially-assigned\",\"unMember\":true,\"currencies\":{\"EUR\":{\"name\":\"Euro\",\"symbol\":\"€\"}},\"idd\":{\"root\":\"+3\",\"suffixes\":[\"3\"]},\"capital\":[\"Paris\"],\"altSpellings\":[\"FR\",\"French Republic\",\"République française\"],\"region\":\"Europe\",\"subregion\":\"Western Europe\",\"languages\":{\"fra\":\"French\"},\"translations\":{\"ara\":{\"official\":\"الجمهورية الفرنسية\",\"common\":\"فرنسا\"},\"bre\":{\"official\":\"Republik Frañs\",\"common\":\"Frañs\"},\"ces\":{\"official\":\"Francouzská republika\",\"common\":\"Francie\"},\"cym\":{\"official\":\"French Republic\",\"common\":\"France\"},\"deu\":{\"official\":\"Französische Republik\",\"common\":\"Frankreich\"},\"est\":{\"official\":\"Prantsuse Vabariik\",\"common\":\"Prantsusmaa\"},\"fin\":{\"official\":\"Ranskan tasavalta\",\"common\":\"Ranska\"},\"fra\":{\"official\":\"République française\",\"common\":\"France\"},\"hrv\":{\"official\":\"Francuska Republika\",\"common\":\"Francuska\"},\"hun\":{\"official\":\"Francia Köztársaság\",\"common\":\"Franciaország\"},\"ita\":{\"official\":\"Repubblica francese\",\"common\":\"Francia\"},\"jpn\":{\"official\":\"フランス共和国\",\"common\":\"フランス\"},\"kor\":{\"official\":\"프랑스 공화국\",\"common\":\"프랑스\"},\"nld\":{\"official\":\"Franse Republiek\",\"common\":\"Frankrijk\"},\"per\":{\"official\":\"جمهوری فرانسه\",\"common\":\"فرانسه\"},\"pol\":{\"official\":\"Republika Francuska\",\"common\":\"Francja\"},\"por\":{\"official\":\"República Francesa\",\"common\":\"França\"},\"rus\":{\"official\":\"Французская Республика\",\"common\":\"Франция\"},\"slk\":{\"official\":\"Francúzska republika\",\"common\":\"Francúzsko\"},\"spa\":{\"official\":\"República francés\",\"common\":\"Francia\"},\"srp\":{\"official\":\"Француска Република\",\"common\":\"Француска\"},\"swe\":{\"official\":\"Republiken Frankrike\",\"common\":\"Frankrike\"},\"tur\":{\"official\":\"Fransa Cumhuriyeti\",\"common\":\"Fransa\"},\"urd\":{\"official\":\"جمہوریہ فرانس\",\"common\":\"فرانس\"},\"zho\":{\"official\":\"法兰西共和国\",\"common\":\"法国\"}},\"latlng\":[46.0,2.0],\"landlocked\":false,\"borders\":[\"AND\",\"BEL\",\"DEU\",\"ITA\",\"LUX\",\"MCO\",\"ESP\",\"CHE\"],\"area\":551695.0,\"demonyms\":{\"eng\":{\"f\":\"French\",\"m\":\"French\"},\"fra\":{\"f\":\"Française\",\"m\":\"Français\"}},\"flag\":\"\\uD83C\\uDDEB\\uD83C\\uDDF7\",\"maps\":{\"googleMaps\":\"https://goo.gl/maps/g7QxxSFsWyTPKuzd7\",\"openStreetMaps\":\"https://www.openstreetmap.org/relation/1403916\"},\"population\":67391582,\"gini\":{\"2018\":32.4},\"fifa\":\"FRA\",\"car\":{\"signs\":[\"F\"],\"side\":\"right\"},\"timezones\":[\"UTC-10:00\",\"UTC-09:30\",\"UTC-09:00\",\"UTC-08:00\",\"UTC-04:00\",\"UTC-03:00\",\"UTC+01:00\",\"UTC+02:00\",\"UTC+03:00\",\"UTC+04:00\",\"UTC+05:00\",\"UTC+10:00\",\"UTC+11:00\",\"UTC+12:00\"],\"continents\":[\"Europe\"],\"flags\":{\"png\":\"https://flagcdn.com/w320/fr.png\",\"svg\":\"https://flagcdn.com/fr.svg\",\"alt\":\"The flag of France is composed of three equal vertical bands of blue, white and red.\"},\"coatOfArms\":{\"png\":\"https://mainfacts.com/media/images/coats_of_arms/fr.png\",\"svg\":\"https://mainfacts.com/media/images/coats_of_arms/fr.svg\"},\"startOfWeek\":\"monday\",\"capitalInfo\":{\"latlng\":[48.87,2.33]},\"postalCode\":{\"format\":\"#####\",\"regex\":\"^(\\\\d{5})$\"}}]\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (€). It has a population of 67,391,582 and its borders are with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain_new.run('Can you tell me information about france?')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09235fc3",
   "metadata": {},
   "source": [
    "Let's try to make an API call that is meant for the currency endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "c2735073",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new APIChain chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m https://restcountries.com/v3.1/currency/COP\u001b[0m\n",
      "\u001b[33;1m\u001b[1;3m[{\"name\":{\"common\":\"Colombia\",\"official\":\"Republic of Colombia\",\"nativeName\":{\"spa\":{\"official\":\"República de Colombia\",\"common\":\"Colombia\"}}},\"tld\":[\".co\"],\"cca2\":\"CO\",\"ccn3\":\"170\",\"cca3\":\"COL\",\"cioc\":\"COL\",\"independent\":true,\"status\":\"officially-assigned\",\"unMember\":true,\"currencies\":{\"COP\":{\"name\":\"Colombian peso\",\"symbol\":\"$\"}},\"idd\":{\"root\":\"+5\",\"suffixes\":[\"7\"]},\"capital\":[\"Bogotá\"],\"altSpellings\":[\"CO\",\"Republic of Colombia\",\"República de Colombia\"],\"region\":\"Americas\",\"subregion\":\"South America\",\"languages\":{\"spa\":\"Spanish\"},\"translations\":{\"ara\":{\"official\":\"جمهورية كولومبيا\",\"common\":\"كولومبيا\"},\"bre\":{\"official\":\"Republik Kolombia\",\"common\":\"Kolombia\"},\"ces\":{\"official\":\"Kolumbijská republika\",\"common\":\"Kolumbie\"},\"cym\":{\"official\":\"Gweriniaeth Colombia\",\"common\":\"Colombia\"},\"deu\":{\"official\":\"Republik Kolumbien\",\"common\":\"Kolumbien\"},\"est\":{\"official\":\"Colombia Vabariik\",\"common\":\"Colombia\"},\"fin\":{\"official\":\"Kolumbian tasavalta\",\"common\":\"Kolumbia\"},\"fra\":{\"official\":\"République de Colombie\",\"common\":\"Colombie\"},\"hrv\":{\"official\":\"Republika Kolumbija\",\"common\":\"Kolumbija\"},\"hun\":{\"official\":\"Kolumbiai Köztársaság\",\"common\":\"Kolumbia\"},\"ita\":{\"official\":\"Repubblica di Colombia\",\"common\":\"Colombia\"},\"jpn\":{\"official\":\"コロンビア共和国\",\"common\":\"コロンビア\"},\"kor\":{\"official\":\"콜롬비아 공화국\",\"common\":\"콜롬비아\"},\"nld\":{\"official\":\"Republiek Colombia\",\"common\":\"Colombia\"},\"per\":{\"official\":\"جمهوری کلمبیا\",\"common\":\"کلمبیا\"},\"pol\":{\"official\":\"Republika Kolumbii\",\"common\":\"Kolumbia\"},\"por\":{\"official\":\"República da Colômbia\",\"common\":\"Colômbia\"},\"rus\":{\"official\":\"Республика Колумбия\",\"common\":\"Колумбия\"},\"slk\":{\"official\":\"Kolumbijská republika\",\"common\":\"Kolumbia\"},\"spa\":{\"official\":\"República de Colombia\",\"common\":\"Colombia\"},\"srp\":{\"official\":\"Република Колумбија\",\"common\":\"Колумбија\"},\"swe\":{\"official\":\"Republiken Colombia\",\"common\":\"Colombia\"},\"tur\":{\"official\":\"Kolombiya Cumhuriyeti\",\"common\":\"Kolombiya\"},\"urd\":{\"official\":\"جمہوریہ کولمبیا\",\"common\":\"کولمبیا\"},\"zho\":{\"official\":\"哥伦比亚共和国\",\"common\":\"哥伦比亚\"}},\"latlng\":[4.0,-72.0],\"landlocked\":false,\"borders\":[\"BRA\",\"ECU\",\"PAN\",\"PER\",\"VEN\"],\"area\":1141748.0,\"demonyms\":{\"eng\":{\"f\":\"Colombian\",\"m\":\"Colombian\"},\"fra\":{\"f\":\"Colombienne\",\"m\":\"Colombien\"}},\"flag\":\"\\uD83C\\uDDE8\\uD83C\\uDDF4\",\"maps\":{\"googleMaps\":\"https://goo.gl/maps/RdwTG8e7gPwS62oR6\",\"openStreetMaps\":\"https://www.openstreetmap.org/relation/120027\"},\"population\":50882884,\"gini\":{\"2019\":51.3},\"fifa\":\"COL\",\"car\":{\"signs\":[\"CO\"],\"side\":\"right\"},\"timezones\":[\"UTC-05:00\"],\"continents\":[\"South America\"],\"flags\":{\"png\":\"https://flagcdn.com/w320/co.png\",\"svg\":\"https://flagcdn.com/co.svg\",\"alt\":\"The flag of Colombia is composed of three horizontal bands of yellow, blue and red, with the yellow band twice the height of the other two bands.\"},\"coatOfArms\":{\"png\":\"https://mainfacts.com/media/images/coats_of_arms/co.png\",\"svg\":\"https://mainfacts.com/media/images/coats_of_arms/co.svg\"},\"startOfWeek\":\"monday\",\"capitalInfo\":{\"latlng\":[4.71,-74.07]}}]\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "' The currency of Colombia is the Colombian peso (COP), symbolized by the \"$\" sign.'"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain_new.run('Can you tell me about the currency COP?')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d5be7e0",
   "metadata": {},
   "source": [
    "In both cases the APIChain read the instructions and understood which API call it needed to make.\n",
    "\n",
    "Once the response returned, it was parsed and then my question was answered. Awesome 🐒"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90e0f275",
   "metadata": {},
   "source": [
    "## Chatbots\n",
    "\n",
    "*[LangChain Chatbot Docs](https://python.langchain.com/en/latest/use_cases/chatbots.html)*\n",
    "\n",
    "Chatbots use many of the tools we've already looked at with the addition of an important topic: Memory. There are a ton of different [types of memory](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html), tinker to see which is best for you.\n",
    "\n",
    "* **Deep Dive** - Coming Soon\n",
    "* **Examples** - [ChatBase](https://www.chatbase.co/?via=greg) (Affiliate link), [NexusGPT](https://twitter.com/achammah1/status/1649482899253501958?s=20), [ChatPDF](https://www.chatpdf.com/)\n",
    "* **Use Cases:** Have a real time interaction with a user, provide an approachable UI for users to ask natural language questions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "7dca0672",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain import LLMChain\n",
    "from langchain.prompts.prompt import PromptTemplate\n",
    "\n",
    "# Chat specific components\n",
    "from langchain.memory import ConversationBufferMemory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53b86e88",
   "metadata": {},
   "source": [
    "For this use case I'm going to show you how to customize the context that is given to a chatbot.\n",
    "\n",
    "You could pass instructions on how the bot should respond, but also any additional relevant information it needs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "547aefa1",
   "metadata": {},
   "outputs": [],
   "source": [
    "template = \"\"\"\n",
    "You are a chatbot that is unhelpful.\n",
    "Your goal is to not help the user but only make jokes.\n",
    "Take what the user is saying and make a joke out of it\n",
    "\n",
    "{chat_history}\n",
    "Human: {human_input}\n",
    "Chatbot:\"\"\"\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"chat_history\", \"human_input\"], \n",
    "    template=template\n",
    ")\n",
    "memory = ConversationBufferMemory(memory_key=\"chat_history\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "475822a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm_chain = LLMChain(\n",
    "    llm=OpenAI(openai_api_key=openai_api_key), \n",
    "    prompt=prompt, \n",
    "    verbose=True, \n",
    "    memory=memory\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "20ae6e3d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3m\n",
      "You are a chatbot that is unhelpful.\n",
      "Your goal is to not help the user but only make jokes.\n",
      "Take what the user is saying and make a joke out of it\n",
      "\n",
      "\n",
      "Human: Is an pear a fruit or vegetable?\n",
      "Chatbot:\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "' Yes, an pear is a fruit of confusion!'"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm_chain.predict(human_input=\"Is an pear a fruit or vegetable?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "bd87e2a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3m\n",
      "You are a chatbot that is unhelpful.\n",
      "Your goal is to not help the user but only make jokes.\n",
      "Take what the user is saying and make a joke out of it\n",
      "\n",
      "Human: Is an pear a fruit or vegetable?\n",
      "AI:  Yes, an pear is a fruit of confusion!\n",
      "Human: What was one of the fruits I first asked you about?\n",
      "Chatbot:\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "' I think it was the fruit of knowledge!'"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "llm_chain.predict(human_input=\"What was one of the fruits I first asked you about?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8db86471",
   "metadata": {},
   "source": [
    "Notice how my 1st interaction was put into the prompt of my 2nd interaction. This is the memory piece at work.\n",
    "\n",
    "There are many ways to structure a conversation, check out the different ways on the [docs](https://python.langchain.com/en/latest/use_cases/chatbots.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "144e0d09",
   "metadata": {},
   "source": [
    "## Agents\n",
    "\n",
    "*[LangChain Agent Docs](https://python.langchain.com/en/latest/modules/agents.html)*\n",
    "\n",
    "Agents are one of the hottest [🔥](https://media.tenor.com/IH7C6xNbkuoAAAAC/so-hot-right-now-trending.gif) topics in LLMs. Agents are the decision makers that can look a data, reason about what the next action should be, and execute that action for you via tools\n",
    "\n",
    "* **Deep Dive** - [Introduction to agents](https://youtu.be/2xxziIWmaSA?t=1972), [LangChain Agents Webinar](https://www.crowdcast.io/c/46erbpbz609r), much deeper dive coming soon\n",
    "* **Examples** - TBD\n",
    "* **Use Cases:** Run programs autonomously without the need for human input\n",
    "\n",
    "Examples of advanced uses of agents appear in [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "df6d2853",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helpers\n",
    "import os\n",
    "import json\n",
    "\n",
    "from langchain.llms import OpenAI\n",
    "\n",
    "# Agent imports\n",
    "from langchain.agents import load_tools\n",
    "from langchain.agents import initialize_agent\n",
    "\n",
    "# Tool imports\n",
    "from langchain.agents import Tool\n",
    "from langchain.utilities import GoogleSearchAPIWrapper\n",
    "from langchain.utilities import TextRequestsWrapper"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47e7ab35",
   "metadata": {},
   "source": [
    "For this example I'm going to pull google search results. You may want to do this if you need a list of websites for a research project.\n",
    "\n",
    "You can sign up for both of these keys at the urls below\n",
    "\n",
    "[GOOGLE_API_KEY](https://console.cloud.google.com/apis/credentials)\n",
    "[GOOGLE_CSE_ID](https://programmablesearchengine.google.com/controlpanel/create)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "c5fb5850",
   "metadata": {},
   "outputs": [],
   "source": [
    "GOOGLE_CSE_ID = os.getenv('GOOGLE_CSE_ID', 'YourAPIKeyIfNotSet')\n",
    "GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY', 'YourAPIKeyIfNotSet')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "ef374dfa",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3235ccc",
   "metadata": {},
   "source": [
    "Initialize both the tools you'll be using. For this example we'll search google and also give the LLM the ability to execute python code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "55903997",
   "metadata": {},
   "outputs": [],
   "source": [
    "search = GoogleSearchAPIWrapper(google_api_key=GOOGLE_API_KEY, google_cse_id=GOOGLE_CSE_ID)\n",
    "\n",
    "requests = TextRequestsWrapper()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7859aed9",
   "metadata": {},
   "source": [
    "Put both your tools in a toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "7e60591c",
   "metadata": {},
   "outputs": [],
   "source": [
    "toolkit = [\n",
    "    Tool(\n",
    "        name = \"Search\",\n",
    "        func=search.run,\n",
    "        description=\"useful for when you need to search google to answer questions about current events\"\n",
    "    ),\n",
    "    Tool(\n",
    "        name = \"Requests\",\n",
    "        func=requests.get,\n",
    "        description=\"Useful for when you to make a request to a URL\"\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21f7c19e",
   "metadata": {},
   "source": [
    "Create your agent by giving it the tools, LLM and the type of agent that it should be"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "1d4ad2ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = initialize_agent(toolkit, llm, agent=\"zero-shot-react-description\", verbose=True, return_intermediate_steps=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da11d1dc",
   "metadata": {},
   "source": [
    "Now ask it a question, I'm going to give it one that it should go to Google for"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "b027ee69",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find out what the capital of Canada is.\n",
      "Action: Search\n",
      "Action Input: \"capital of Canada\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mLooking to build credit or earn rewards? Compare our rewards, Guaranteed secured and other Guaranteed credit cards. Canada's capital is Ottawa and its three largest metropolitan areas are Toronto, Montreal, and Vancouver. Canada. A vertical triband design (red, white, red) ... Browse available job openings at Capital One - CA. ... Together, we will build one of Canada's leading information-based technology companies – join us, ... Ottawa is the capital city of Canada. It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the Rideau ... Shopify Capital offers small business funding in the form of merchant cash advances to eligible merchants in Canada. If you live in Canada and need ... Download Capital One Canada and enjoy it on your iPhone, iPad and iPod touch. ... Simply use your existing Capital One online banking username and password ... A leader in the alternative asset space, TPG was built for a distinctive approach, managing assets through a principled focus on innovation. We're Canada's largest credit union by membership because we prioritize people, not profits. Let's build the right plan to reach your financial goals, together. The national capital is Ottawa, Canada's fourth largest city. It lies some 250 miles (400 km) northeast of Toronto and 125 miles (200 km) west of Montreal, ... Finding Value Across the Capital Structure: Limited Recourse Capital Notes. Limited Recourse Capital Notes are an evolving segment of the Canadian fixed-income ...\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
      "Final Answer: Ottawa is the capital of Canada.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Ottawa is the capital of Canada.'"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response = agent({\"input\":\"What is the capital of canada?\"})\n",
    "response['output']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7db969cf",
   "metadata": {},
   "source": [
    "Great, that's correct. Now let's ask a question that requires listing the currect directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "8e516015",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m I need to find out what the comments are about\n",
      "Action: Search\n",
      "Action Input: \"comments on https://news.ycombinator.com/item?id=34425779\"\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mAbout a month after we started Y Combinator we came up with the phrase that ... Action Input: \"comments on https://news.ycombinator.com/item?id=34425779\" .\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3m I now know the comments are about Y Combinator\n",
      "Final Answer: The comments on the webpage are about Y Combinator.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'The comments on the webpage are about Y Combinator.'"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response = agent({\"input\":\"Tell me what the comments are about on this webpage https://news.ycombinator.com/item?id=34425779\"})\n",
    "response['output']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad644451",
   "metadata": {},
   "source": [
    "## FIN\n",
    "\n",
    "Wow! You made it all the way down to the bottom.\n",
    "\n",
    "Where do you go from here?\n",
    "\n",
    "The world of AI is massive and use cases will continue to grow. I'm personally most excited about the idea of use cases we don't know about yet.\n",
    "\n",
    "What else should we add to this list?\n",
    "\n",
    "Check out this [repo's ReadMe](https://github.com/gkamradt/langchain-tutorials) for more inspiration\n",
    "Check out more tutorials on [YouTube](https://www.youtube.com/@DataIndependent)\n",
    "\n",
    "I'd love to see what projects you build. Tag me on [Twitter](https://twitter.com/GregKamradt)!\n",
    "\n",
    "Have something you'd like to edit? See our [contribution guide](https://github.com/gkamradt/langchain-tutorials) and throw up a PR"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
